Pigeons (Columba livia) as Trainable Observers of Pathology and Radiology Breast Cancer Images

Richard M. Levenson, Elizabeth A. Krupinski, Victor M. Navarro, Edward A. Wasserman
PLOS ONE
Department of Pathology and Laboratory Medicine, University of California Davis Medical Center

Overall Summary

Study Background and Main Findings

This research explored the potential of using pigeons (Columba livia) as 'surrogate observers' for evaluating medical images, a novel approach motivated by the high cost and time investment required for human expert validation. The study investigated whether pigeons, through operant conditioning with food rewards, could learn to discriminate between benign and malignant examples in histopathology and radiology images. The researchers addressed four key questions: the pigeons' basic trainability, their ability to generalize beyond memorization, their performance limits on difficult tasks, and the practical utility of their skills.

The study involved a series of experiments using a custom-built operant conditioning chamber where pigeons interacted with a touchscreen. In Experiment 1, pigeons were trained to classify breast histopathology images at different magnifications. To assess generalization, they were tested on novel image sets they hadn't seen during training. Experiments 2 and 3 focused on radiology tasks: detecting microcalcifications and classifying mammographic masses, respectively. The researchers also manipulated image properties, such as color, luminance, and compression, to investigate the visual cues used by the pigeons.

The results showed that pigeons could successfully learn to classify histopathology images with high accuracy (around 85%) and, importantly, generalize this skill to new images. A novel 'flock-sourcing' method, combining judgments from multiple pigeons, achieved even higher accuracy (99%). The pigeons also performed well in detecting microcalcifications on mammograms. However, they struggled with the more complex task of classifying mammographic masses, demonstrating an inability to generalize beyond the training set. Manipulating image properties revealed that color and luminance aided performance but weren't essential, and that pigeons could adapt to compressed images with further training.
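The pooling idea behind 'flock-sourcing' can be illustrated with a minimal sketch. It assumes a simple majority vote over each bird's benign/malignant choice; this summary does not specify the study's exact pooling rule, and the bird names, labels, and votes below are made up for illustration rather than taken from the paper.

```python
from collections import Counter


def pooled_vote(votes):
    """Return the label chosen by most birds for one image.

    Ties resolve to 'malignant'; that tie-break is an arbitrary assumption.
    """
    counts = Counter(votes)
    if counts["malignant"] >= counts["benign"]:
        return "malignant"
    return "benign"


def accuracy(predictions, truth):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)


# Hypothetical ground truth for six images (not data from the study).
truth = ["benign", "malignant", "malignant", "benign", "malignant", "benign"]

# Hypothetical responses: each bird errs on exactly one, different image.
birds = {
    "bird_a": ["benign", "malignant", "benign", "benign", "malignant", "benign"],
    "bird_b": ["benign", "benign", "malignant", "benign", "malignant", "benign"],
    "bird_c": ["malignant", "malignant", "malignant", "benign", "malignant", "benign"],
    "bird_d": ["benign", "malignant", "malignant", "malignant", "malignant", "benign"],
}

# Pool the flock's votes image by image.
pooled = [pooled_vote(image_votes) for image_votes in zip(*birds.values())]

individual = {name: accuracy(resp, truth) for name, resp in birds.items()}
flock = accuracy(pooled, truth)

for name, acc in individual.items():
    print(f"{name}: {acc:.2f}")
print(f"flock: {flock:.2f}")
```

In this toy data each bird errs on a different image, so the pooled vote recovers every label even though no single bird does. Pooling helps only to the extent that the birds' errors are not shared.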

The researchers concluded that pigeons can serve as a viable model for studying certain aspects of medical image perception, particularly for tasks involving visual discrimination. Their successes and failures mirrored the relative difficulty of these tasks for humans, suggesting that pigeons could be a cost-effective and controllable alternative to human observers for evaluating image quality and the impact of image processing techniques. The study also highlighted the potential of 'flock-sourcing' as a method for enhancing diagnostic accuracy. However, further research is needed to understand the specific visual features and strategies used by the pigeons.

Research Impact and Future Directions

This study demonstrates the remarkable ability of pigeons to learn complex visual discriminations in medical images, achieving high accuracy in histopathology and microcalcification detection. The research is notable not only for its novelty but also for its rigorous methodology, including controls for memorization and systematic manipulation of image parameters. The 'flock-sourcing' approach, where the judgments of multiple birds are combined, is particularly innovative and yielded impressive results, highlighting the potential of collective intelligence even in relatively simple animal models.

While the pigeons' failure to generalize on the mammogram mass classification task underscores the limits of their perceptual abilities, this limitation actually strengthens the model's validity. By mirroring the challenges faced by human experts, the pigeons provide a valuable tool for understanding the perceptual demands of medical image interpretation. The study's findings have practical implications for image quality assessment, potentially offering a cost-effective and controllable alternative to human observers for evaluating the impact of image processing techniques and display parameters.

The study's primary limitation is that it shows that pigeons can distinguish between image categories without revealing how they do so. Further research is needed to identify the specific visual features and strategies the pigeons use. Exploring these mechanisms would not only deepen our understanding of avian visual processing but could also inform training methods for human experts and the design of computer-aided diagnostic tools. Despite this limitation, the study's innovative approach and rigorous methodology establish a strong foundation for future research in comparative visual cognition and its application to medical imaging.

Critical Analysis and Recommendations

Comprehensive and Balanced Summary (written-content)
The abstract provides a comprehensive overview of the study's key elements, including the novel use of pigeons, their successes and failures on different tasks, and the broader implications. This clear and concise summary effectively communicates the research's significance to a broad audience.
Section: Abstract
Connect Findings to Computational Models (written-content)
The abstract could be strengthened by explicitly connecting the pigeon model to the development and validation of computational models in medical imaging. This would enhance the paper's relevance to a key area of current research.
Section: Abstract
Clear Problem Framing and Motivation (written-content)
The introduction effectively frames the problem of human expertise in medical imaging being expensive and time-consuming, motivating the need for alternative approaches. This clear problem statement immediately establishes the study's relevance.
Section: Introduction
Frame Pigeons as Benchmark for AI (written-content)
The introduction could be improved by explicitly positioning the pigeon model as a biological benchmark for AI in medical imaging. This would strengthen the paper's connection to a major contemporary challenge.
Section: Introduction
Detailed and Reproducible Training Regimen (written-content)
The detailed description of the operant conditioning protocol, including trial structure and reinforcement schedules, ensures transparency and reproducibility, which are essential for scientific rigor.
Section: Materials and Methods
Quantify Manual Image Adjustments (written-content)
The methods section lacks quantitative details about manual image adjustments. Specifying target parameters for brightness and contrast would improve reproducibility and eliminate subjectivity.
Section: Materials and Methods
Visualizing Histopathology Stimuli (graphical-figure)
Figure 2 effectively presents the histopathology stimuli, allowing readers to visually assess the discrimination task. However, adding annotations to highlight key diagnostic features would improve clarity for non-experts.
Section: Materials and Methods
Visualizing Mass Classification Difficulty (graphical-figure)
Figure 5 effectively conveys the difficulty of the mass classification task. However, the lack of annotations makes it challenging for non-experts to discern the subtle features that distinguish benign from malignant masses.
Section: Materials and Methods
Evidence for Generalization (written-content)
The results clearly demonstrate the pigeons' ability to generalize learned concepts to novel histopathology images, providing strong evidence for true learning rather than memorization. This generalization is a key finding that supports the study's main claims.
Section: Results
Visualize Flock-Sourcing Dynamics (written-content)
The 'flock-sourcing' analysis revealed significantly improved accuracy, but the mechanism remains unclear. A more granular analysis visualizing how individual errors are corrected in the group would strengthen this novel finding.
Section: Results
Coherent Synthesis of Results (written-content)
The discussion effectively synthesizes the findings from multiple experiments, providing a cohesive narrative that integrates successes and failures. This comprehensive interpretation strengthens the study's overall impact.
Section: Discussion
Frame Pigeons as a Dynamic Benchmark for AI (written-content)
The discussion could be enhanced by explicitly proposing the pigeon model as a dynamic benchmark for AI development in medical imaging. This would highlight the model's potential beyond a source of inspiration.
Section: Discussion

Section Analysis

Materials and Methods

Non-Text Elements

Fig 1. The pigeons' training environment. The operant conditioning chamber was equipped with a food pellet dispenser, and a touch-sensitive screen upon which the medical image (center) and choice buttons (blue and yellow rectangles) were presented.
First Reference in Text
The chambers (shown in Fig 1) measured 36 cm × 36 cm × 41 cm and were located in a dark room with continuous white noise played during sessions.
Fig 2. Examples of benign (left) and malignant (right) breast specimens stained with hematoxylin and eosin, at different magnifications. Pigeons were initially trained and tested with samples at 4x magnification (top row), and then were subsequently transitioned to samples at 10x magnification (center row) and 20x magnification (bottom row).
First Reference in Text
See Fig 2 for a representative sample of images displayed to the birds.
Description
  • Image Content and Organization: The figure presents a grid of 18 histopathology images, which are microscopic views of tissue samples. These are specifically from human breast tissue specimens that have been categorized as either 'benign' (non-cancerous) on the left or 'malignant' (cancerous) on the right.
  • Hematoxylin and Eosin (H&E) Staining: All specimens are stained with hematoxylin and eosin (H&E), a standard staining method in pathology. Hematoxylin stains cell nuclei a purplish-blue color, highlighting the cell's control center, while eosin stains other structures like cytoplasm and connective tissue in various shades of pink. This color contrast makes the tissue's architecture and cellular details visible.
  • Multiple Magnification Levels: The images are shown at three different levels of magnification, arranged in rows: 4x (low power), 10x (medium power), and 20x (high power). This progression is analogous to zooming in with a camera, moving from a wide overview of the tissue landscape (4x) to a more detailed view of individual cell groups (20x). The caption notes that this sequence matches the order in which pigeons were trained.
  • Visual Characteristics of Benign vs. Malignant Tissue: Visually, the benign samples generally show more organized and well-defined structures, such as circular ducts and lobules, with more pink-staining space between them. In contrast, the malignant samples often appear more chaotic and densely packed with dark purple-staining cells, reflecting the uncontrolled cell growth characteristic of cancer. These visual differences form the basis of the discrimination task for the pigeons.
Scientific Validity
  • ✅ The figure provides crucial insight into the experimental stimuli.: Displaying examples of the actual visual stimuli is a critical component of a methods section for a visual perception study. This figure allows the reader to directly assess the nature and potential difficulty of the discrimination task, which is essential for interpreting the study's results.
  • ✅ The use of varying magnifications represents a robust experimental design.: The experimental design of training pigeons across multiple magnifications (4x, 10x, 20x) is a methodological strength. It tests whether the animals can learn to identify pathological features at different spatial scales, which mirrors a key skill used by human pathologists and adds a layer of complexity and relevance to the study.
  • 💡 The representativeness of the selected examples is not defined.: The text states these are 'representative' images. However, without information on the selection criteria, there is a potential for selection bias. Were these images chosen because they are particularly clear-cut examples, or do they reflect the average difficulty of the entire stimulus set? Acknowledging the difficulty level of these specific examples would strengthen the transparency of the methods.
  • 💡 The images lack scale bars for absolute size reference.: For scientific rigor in publishing microscopy images, a scale bar is standard practice. While magnification levels are provided, they are relative and can be affected by display size. A scale bar (e.g., 100 µm) would provide an absolute, objective measure of size within each image, which is more informative and aids in reproducibility.
Communication
  • ✅ The figure's grid layout is highly effective for comparison.: The grid layout is exceptionally clear and well-organized. By arranging the images by condition (Benign vs. Malignant) in columns and by magnification in rows, the figure allows for easy and intuitive visual comparison between the categories at each level of detail.
  • ✅ The caption is highly informative and enhances the figure's self-sufficiency.: The caption is comprehensive and makes the figure largely self-contained. It clearly identifies the tissue type, staining method, image categories, and the training sequence corresponding to the different magnifications shown. This allows readers to understand the stimuli and the experimental progression without needing to search the main text.
  • ✅ The labeling is clear and effective.: The labels for rows ('4x', '10x', '20x') and columns ('Benign samples', 'Malignant samples') are clear, legible, and appropriately placed, which is crucial for the figure's interpretability.
  • 💡 Annotating key diagnostic features would improve clarity for a broader audience.: While the images are illustrative, their educational value could be enhanced for a non-expert audience. The key visual differences that define benign versus malignant tissue (e.g., organized ductal structures vs. disorganized sheets of cells) are subtle. Suggest adding annotations like arrows or outlines to a few key examples to highlight these discriminating features, which would clarify the visual challenge presented to the pigeons.
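The scale-bar suggestion above comes down to simple arithmetic: the bar's length in pixels is its physical length divided by the image's pixel size. The sketch below assumes hypothetical pixel sizes for each magnification; the actual values depend on the scanner or camera and are not given in this review.

```python
def scale_bar_pixels(bar_length_um, microns_per_pixel):
    """Length in pixels of a scale bar representing bar_length_um micrometers."""
    if microns_per_pixel <= 0:
        raise ValueError("microns_per_pixel must be positive")
    return round(bar_length_um / microns_per_pixel)


# Hypothetical pixel sizes (micrometers per pixel); NOT taken from the paper.
HYPOTHETICAL_UM_PER_PIXEL = {"4x": 2.5, "10x": 1.0, "20x": 0.5}

# Pixel length of a 100 µm bar at each magnification.
bars = {
    magnification: scale_bar_pixels(100, um_per_pixel)
    for magnification, um_per_pixel in HYPOTHETICAL_UM_PER_PIXEL.items()
}

for magnification, pixels in bars.items():
    print(f"{magnification}: draw a 100 µm bar {pixels} px long")
```

Because the bar is drawn in image pixels, it rescales with the image on any display, which is what makes it an absolute reference in a way that a stated magnification is not.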
Fig 3. Monochrome images with equated hue and brightness, at different levels of compression. The original images at 10x magnification were converted to grayscale, colored with a single hue, and had their overall brightness and contrast equalized as closely as possible.
First Reference in Text
Monochrome stimuli. The 10x stimuli at 0° were used, but were converted to monochrome and equated in hue and brightness to eliminate those image properties as variables (see Fig 3, top row, for representative images).
Description
  • Monochrome and Equalized Images: This figure displays a grid of histopathology images that have been digitally manipulated to test which visual cues pigeons use for classification. The original 10x magnification color images were first converted to monochrome (single color) by making them grayscale and then applying a uniform purplish hue. This single-hue tinting removes color differences as a variable. The caption states that brightness and contrast were also adjusted to be as similar as possible across all images.
  • Levels of Image Compression: The figure's main purpose is to show the effects of image compression, a method for reducing a digital file's size that can degrade image quality. The rows represent three levels of compression. The top row, labeled '1:1', shows the baseline uncompressed images. The middle row ('15:1') and bottom row ('27:1') show the same images compressed to roughly 1/15 and 1/27 of their original file size, respectively. This compression introduces visible distortions, known as artifacts, such as blockiness and a loss of fine detail, which are more severe in the bottom row.
  • Benign vs. Malignant Comparison: Similar to the previous figure, the images are separated into columns of 'Benign samples' (non-cancerous) and 'Malignant samples' (cancerous). This layout allows for a side-by-side comparison to see how the features that distinguish these two conditions are affected by the removal of color cues and the introduction of compression artifacts.
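A ratio such as '15:1' is the original file size divided by the compressed file size. The sketch below shows that calculation. It uses Python's lossless `zlib` on synthetic bytes purely as a stand-in to produce a compressed size; the artifacts described above imply lossy image compression, which this example does not reproduce.

```python
import zlib


def compression_ratio(original_size, compressed_size):
    """Original size divided by compressed size (15.0 corresponds to '15:1')."""
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    return original_size / compressed_size


# Synthetic, highly repetitive payload standing in for image data (an assumption).
raw = bytes(range(256)) * 64
packed = zlib.compress(raw, 9)  # lossless stand-in, not the study's method

ratio = compression_ratio(len(raw), len(packed))
print(f"{len(raw)} bytes -> {len(packed)} bytes, ratio {ratio:.1f}:1")

# The same arithmetic for the ratios labeled in the figure:
print(compression_ratio(1500, 100))  # a 1500-byte file stored in 100 bytes
print(compression_ratio(2700, 100))  # a 2700-byte file stored in 100 bytes
```

The ratio describes file size only. How much diagnostic detail survives at a given ratio depends on the compression method and the image content, which is why the study tested performance directly.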
Scientific Validity
  • ✅ The image manipulation represents a robust experimental control.: The systematic removal of color and normalization of brightness/contrast is a strong experimental control. This manipulation allows the researchers to isolate the importance of morphological and textural information for the discrimination task, providing a more rigorous test of what the pigeons are actually learning.
  • ✅ The investigation of compression artifacts adds practical relevance to the study.: Testing the effect of image compression is highly relevant to the field of digital pathology, where managing large file sizes is a practical challenge. By assessing how performance changes with compressed images, the study explores the practical utility of using pigeons as 'surrogate observers' for tasks involving real-world image quality issues.
  • 💡 The subjective description of image equalization lacks quantitative support.: The caption describes the brightness and contrast equalization as being done 'as closely as possible,' which is a subjective statement. For greater methodological rigor, the authors should provide quantitative data (e.g., mean luminance, pixel intensity standard deviation) for the benign and malignant image sets to objectively demonstrate how successful the equalization process was.
  • 💡 The images are missing standard scale bars for absolute size reference.: As with previous figures, these microscopy images lack scale bars. While the 10x magnification is stated, an absolute scale bar (e.g., in micrometers) is the standard for scientific publication. It would provide an objective measure of the size of cellular structures and help in assessing the impact of compression on features of a specific size.
Communication
  • ✅ The figure's organization effectively communicates the experimental variables.: The grid layout is highly effective. By organizing images by diagnosis (columns) and compression level (rows), the figure allows for an intuitive and direct comparison of how compression artifacts affect the visibility of features in both benign and malignant tissues.
  • ✅ The visualization of compression artifacts is clear and impactful.: The figure successfully visualizes the abstract concept of image compression. The progressive degradation of image quality from the top row (uncompressed) to the bottom row (heavily compressed) is immediately obvious, clearly illustrating the visual challenge being tested.
  • ✅ The labeling is clear and effective.: The labels for the rows ('1:1', '15:1', '27:1') and columns ('Benign samples', 'Malignant samples') are clear and well-placed. The caption provides the necessary context to understand these labels.
  • 💡 The process of 'equalization' could be more quantitatively described or visualized.: The caption states that brightness and contrast were 'equalized as closely as possible'. This is a subjective description. To improve clarity and rigor, it would be beneficial to add a supplementary figure or data showing the luminance histograms for the benign and malignant image sets to quantitatively demonstrate the degree of equalization achieved.
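The quantitative check suggested above could be as simple as comparing mean luminance and pixel-intensity standard deviation between the two image sets. The sketch below is a minimal illustration of that comparison, assuming grayscale images represented as nested lists of 0–255 values; the tiny 2×2 "images" are made up and do not come from the study.

```python
from statistics import mean, pstdev


def luminance_summary(image):
    """Mean and population standard deviation of one grayscale image's pixels."""
    pixels = [value for row in image for value in row]
    return mean(pixels), pstdev(pixels)


def set_summary(images):
    """Average the per-image mean luminance and contrast (SD) over a set."""
    summaries = [luminance_summary(image) for image in images]
    mean_luminance = mean([m for m, _ in summaries])
    mean_contrast = mean([sd for _, sd in summaries])
    return mean_luminance, mean_contrast


# Hypothetical 2x2 grayscale images (values 0-255); NOT data from the paper.
benign_set = [
    [[100, 120], [140, 160]],
    [[110, 130], [150, 170]],
]
malignant_set = [
    [[105, 125], [145, 165]],
    [[105, 125], [145, 165]],
]

benign_mean, benign_sd = set_summary(benign_set)
malignant_mean, malignant_sd = set_summary(malignant_set)

print(f"benign:    mean {benign_mean:.1f}, SD {benign_sd:.2f}")
print(f"malignant: mean {malignant_mean:.1f}, SD {malignant_sd:.2f}")
print(f"difference in mean luminance: {abs(benign_mean - malignant_mean):.2f}")
```

Reporting these two numbers per category, or the full luminance histograms, would let readers judge how closely the sets were matched instead of relying on the phrase 'as closely as possible'.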
Fig 4. Mammograms with the absence (left) and with presence (right) of microcalcifications. Yellow circles denote where microcalcifications are located.
First Reference in Text
A total of 40 regions of interest were cropped from anonymized mammograms approved for research use by the University of Arizona IRB: 20 containing subtle clusters of microcalcifications plus 20 examples without clusters (see Fig 4 for representative images; a complete image set is available in the S2 File included in the Supporting Information).
Description
  • Mammogram Image Stimuli: The figure displays a set of mammograms, which are grayscale X-ray images used to examine breast tissue. The images are divided into two groups: those on the left show breast tissue with 'No calcifications,' while those on the right show tissue with the 'presence' of microcalcifications.
  • Microcalcifications as Visual Targets: Microcalcifications are tiny deposits of calcium that appear as small, bright white specks on a mammogram. They can sometimes be an early indicator of breast cancer. As shown in the figure, these specks are very subtle and can be difficult to distinguish from the complex, cloudy background texture of the normal breast tissue.
  • Annotation for Clarity: To help the viewer locate these difficult-to-see targets, the scientists have added yellow circles to the images on the right, highlighting the areas where the microcalcification clusters are located. These circles were for the benefit of the reader and were not shown to the pigeons during the experiment.
  • Representative Experimental Set: These images are presented as representative examples from a larger set used in the experiment. According to the reference text, the full stimulus set consisted of 40 images in total: 20 containing these subtle microcalcification clusters and 20 without them.
Scientific Validity
  • ✅ Displaying the experimental stimuli is a methodological strength.: Showing examples of the actual stimuli is critical for a visual perception study. This figure allows the scientific audience to directly assess the difficulty and nature of the task, which is essential for interpreting the pigeons' performance data.
  • ✅ The chosen task has high clinical relevance.: The task of detecting microcalcifications on mammograms is a clinically relevant and often challenging problem for human radiologists. Using these stimuli makes the study's findings more interesting and potentially applicable to understanding medical image perception, beyond simple pattern recognition.
  • 💡 The difficulty level of the 'representative' images is not specified.: The reference text states that the images shown are 'representative'. While the text later clarifies that the full sets were balanced for difficulty using human radiologist scores (a strong methodological choice), it is not specified whether the examples in this figure are of low, average, or high difficulty. This information would provide better context for the visual evidence presented.
  • ✅ The use of cropped regions of interest creates a well-controlled task.: The reference text confirms that these images are cropped 'regions of interest' from full mammograms. This is an important methodological detail, as it means the pigeons did not have to perform a search task across a large image but rather a detection/classification task on a pre-selected area. This simplifies the task and focuses the experiment on feature recognition, which is a valid and well-controlled design choice.
Communication
  • ✅ The comparative layout is clear and intuitive.: The side-by-side layout comparing images with the target feature ('Calcifications') to those without ('No calcifications') is a simple and highly effective way to present the visual stimuli. It immediately clarifies the nature of the discrimination task.
  • ✅ The annotation with yellow circles is highly effective for guiding the reader.: The use of yellow circles to highlight the microcalcifications is an excellent communication strategy. Given that the targets are extremely subtle, these annotations are essential for the reader to quickly and reliably identify the features of interest, making the figure's point about task difficulty very effectively.
  • ✅ The caption is clear and informative.: The caption is concise and accurately describes the figure's content. It clearly states what is being shown and explains the purpose of the annotations, making the figure largely self-contained.
  • 💡 The annotations, while useful, partially obscure the target features.: While the yellow circles are helpful, they are quite large and can obscure the texture of the tissue immediately surrounding the microcalcifications. For one or two examples, consider using arrows pointing to the features instead of an enclosing circle. This would allow the reader to better appreciate the subtlety of the target against its direct background.
Fig 5. Examples of benign (left) and malignant (right) masses in mammograms. Subsequent biopsy established histopathology ground-truth.
First Reference in Text
A total of 40 region-of-interest images cropped from anonymized mammograms approved for research use by the University of Arizona IRB, consisting of 20 samples with malignant masses and 20 samples with benign masses were used (see Fig 5 for representative images).
Description
  • Mammogram Images of Breast Masses: The figure displays a grid of 12 images taken from mammograms, which are a type of X-ray used for breast cancer screening. These images focus on 'masses,' which are areas of tissue that appear denser or different from the surrounding tissue. The images are categorized into 'benign' (non-cancerous) masses on the left and 'malignant' (cancerous) masses on the right.
  • Biopsy-Confirmed Ground Truth: The caption crucially states that 'subsequent biopsy established histopathology ground-truth.' This means that after the X-ray was taken, a small tissue sample (a biopsy) was physically removed from the mass and examined under a microscope by a pathologist. This microscopic analysis provides the definitive, 'ground-truth' diagnosis of whether the mass was actually benign or malignant, ensuring the images were correctly labeled for the experiment.
  • Subtle Visual Distinctions: The visual differences between the two categories are extremely subtle. In medical practice, radiologists look for clues in the shape and border of the mass; for example, malignant masses often have irregular, fuzzy, or spiky ('spiculated') edges, while benign masses tend to be smoother and more rounded. However, these characteristics are very difficult to discern in the provided examples, highlighting the significant challenge of this classification task.
Scientific Validity
  • ✅ The use of biopsy-confirmed ground truth is the gold standard for this type of study.: The use of biopsy-confirmed 'ground-truth' is a major methodological strength. It ensures that the labels for 'benign' and 'malignant' are unequivocally correct, which is essential for training and testing any classifier, whether an animal or a computer algorithm.
  • ✅ The experimental task has high clinical relevance.: The task of differentiating benign from malignant masses on mammograms is a core challenge in clinical radiology. Using these stimuli makes the experiment highly relevant to real-world medical image perception and provides a strong test case for the limits of the pigeons' visual abilities.
  • ✅ The use of cropped regions of interest represents a well-controlled experimental design.: The text confirms that these are cropped 'regions of interest.' This is a sound experimental control, as it isolates the classification task from a visual search task (finding the mass on a full mammogram). This allows the researchers to focus specifically on the ability to discriminate features.
  • 💡 The difficulty level of the selected 'representative' images is not defined.: The figure shows 'representative images,' but the criteria for their selection are not mentioned here. The text later clarifies that the full image sets were balanced for difficulty based on human radiologist performance. It would strengthen this figure to state whether these specific examples are of average, low, or high difficulty to provide better context for the visual evidence of the task's challenge.
Communication
  • ✅ The comparative layout is clear and effective.: The simple side-by-side layout, with benign examples on the left and malignant on the right, is a clear and effective way to organize the visual stimuli for comparison.
  • ✅ The figure effectively demonstrates the visual difficulty of the task.: The figure powerfully communicates the extreme difficulty of the task. The visual differences between the benign and malignant masses are incredibly subtle, which effectively primes the reader to understand why this task was challenging for the pigeons, as discussed later in the results.
  • 💡 The absence of annotations makes it difficult to discern the relevant features.: Unlike in Figure 4, there are no annotations to guide the viewer. Because the distinguishing features (e.g., the shape of the mass margins) are so subtle, the figure fails to educate the non-expert reader on what visual cues are relevant. Suggest adding outlines or arrows to highlight the borders of the masses in a few examples to clarify the specific visual challenge.

Results

Non-Text Elements

Fig 6. Results of training with breast histopathology samples at different...
Full Caption

Fig 6. Results of training with breast histopathology samples at different magnifications and rotations. A) When first trained with 4x magnification images the birds performed at chance levels of accuracy, but quickly learned to discriminate.

First Reference in Text
Remarkably, the pigeons rapidly learned to discriminate the appearance of benign from malignant breast tissue histology with high accuracy (Fig 6A), correct choice responses levels rising from 50% at the outset (i.e., at chance level) to 85% over 15 days of training.
Description
  • Learning Curve Over Time: This line graph (Panel A) illustrates the learning progress of pigeons over 15 consecutive days of training. The vertical y-axis, 'Percent correct,' shows the accuracy of the pigeons' choices, ranging from 40% to 100%. The horizontal x-axis represents the training 'Day.'
  • Comparison Across Magnifications: The graph displays three separate lines, each representing a different magnification level of the pathology images the pigeons were shown: 4x (lowest zoom), 10x, and 20x (highest zoom). This allows for a comparison of how quickly the pigeons learned to classify images at different levels of detail.
  • Performance Improvement from Chance: The data for the initial 4x training shows a clear learning curve. According to the reference text, the pigeons started at an accuracy of 50%, which is the 'chance level'—the score expected from random guessing in a two-choice task. Over 15 days, their performance steadily improved to approximately 85% accuracy.
  • Evidence of Knowledge Transfer: When subsequently trained on higher magnifications (10x and 20x), the pigeons' starting accuracy was already well above 50% (around 60-70%). This indicates that they were able to transfer some of the knowledge gained from the lower magnification images to the new, more detailed images.
  • Indication of Performance Variability: The small vertical lines (error bars) at each data point represent the variability in performance among the group of pigeons being tested. Shorter bars indicate that the pigeons performed more similarly to one another, while longer bars suggest a wider range of individual accuracies.
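The 'chance level' idea in the description above can be made concrete with an exact binomial tail probability. The sketch below is illustrative only: the session size of 100 trials is a hypothetical assumption, and this is not presented as the statistical analysis used in the paper.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p_chance: float = 0.5) -> float:
    """Probability of at least `successes` correct choices in `trials`
    two-choice trials if the observer is purely guessing."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical session of 100 trials (actual trial counts are not given here).
p_at_chance = binomial_tail(50, 100)  # 50% correct: fully consistent with guessing
p_trained = binomial_tail(85, 100)    # 85% correct: very unlikely under guessing

print(f"P(>=50/100 correct | guessing) = {p_at_chance:.3f}")
print(f"P(>=85/100 correct | guessing) = {p_trained:.2e}")
```

Under these assumed numbers, a score of 50% is entirely compatible with guessing, whereas a score of 85% would almost never arise by chance.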
Scientific Validity
  • ✅ The graph provides strong evidence for the authors' claims about learning.: The data presented in the graph strongly supports the central claim made in the reference text and caption: pigeons rapidly learned to discriminate the images, with accuracy rising from chance (50%) to a high level (~85%). The visual evidence of the learning curve is clear and compelling.
  • ✅ The inclusion of error bars indicates statistical rigor.: The inclusion of error bars is good scientific practice, as it provides an indication of the variance within the group of subjects. This is crucial for understanding the reliability and consistency of the observed learning effect.
  • ✅ The data visualization effectively shows evidence of knowledge transfer.: The graph not only shows the primary learning curve at 4x but also demonstrates knowledge transfer to higher magnifications. The fact that pigeons started the 10x and 20x tasks with above-chance accuracy is a significant finding that suggests generalization of learned features, which is well-visualized in the plot.
  • 💡 The number of subjects (n) is not reported in the figure.: The number of pigeons (n) used to generate these averages and error bars is not stated in the figure caption or legend. This is a critical piece of information for a reader to fully evaluate the statistical power and generalizability of the findings. This information should always be included in the caption.
  • 💡 The statistical significance of the learning trend is not indicated on the graph itself.: The reference text mentions that the rise in performance was 'statistically significant, p = 0.001'. While this information is provided in the text, it is best practice for the figure to be as self-contained as possible. The authors could consider adding an asterisk or other symbol to the graph to denote the significance of the learning trend, with an explanation in the caption.
Communication
  • ✅ The choice of a line graph is highly appropriate for showing learning over time.: Using a line graph is the ideal choice for visualizing performance data over time, as it clearly illustrates the learning trend. The upward slope of the lines effectively communicates the acquisition of the discrimination skill.
  • ✅ The 'chance level' reference line is a strong visual aid.: The inclusion of a dotted line at 50% provides an excellent visual benchmark for 'chance level' performance. This makes it immediately obvious to the reader when the pigeons' accuracy surpassed random guessing.
  • ✅ The graph's labels and legend are clear and informative.: The axes and legend are clearly labeled, allowing the reader to understand what is being measured (Percent correct vs. Day) and to distinguish between the different magnification conditions (4x, 10x, 20x).
  • 💡 Using distinct colors for each data series would improve visual clarity.: While the symbols in the legend are distinct, all three data series are plotted in black. Using different colors for each magnification level (e.g., blue for 4x, green for 10x, red for 20x) would enhance the visual separation between the learning curves and make the graph easier to interpret at a glance.
Fig 7. Generalization from training to test image sets. After training with...
Full Caption

Fig 7. Generalization from training to test image sets. After training with differential reinforcement, the birds successfully classified previously unseen breast tissue images in the testing sets, at all magnifications, with no statistically significant decrease in accuracy compared to training-set performance.

First Reference in Text
Accordingly, during a 5-day period after the end of training at each magnification level, pigeons were given a small number of novel benign and malignant breast tissue images intermixed with the full set of familiar training images.
Description
  • Bar Chart Comparing Performance: This figure presents a bar chart that compares the performance of pigeons on two different sets of images after they have been trained. The vertical axis shows the 'Percent correct' (accuracy), while the horizontal axis shows the three different image magnification levels tested: 4x, 10x, and 20x.
  • Testing for Generalization vs. Memorization: The key comparison is between the two bars at each magnification. The darker 'Training' bar represents the pigeons' accuracy on images they had seen many times before. The lighter 'Testing' bar represents their accuracy on brand-new images they had never seen. This is a critical test of generalization—whether the birds learned a general rule (e.g., what cancer 'looks like') that they could apply to novel examples, rather than just memorizing the old ones.
  • High Accuracy on Both Familiar and Novel Images: Across all three magnifications, the pigeons' performance was very high, with accuracy for the familiar 'Training' images hovering around 85-88%. Crucially, the accuracy for the novel 'Testing' images was nearly identical, also around 85%. The caption confirms that there was no statistically significant decrease in accuracy relative to training-set performance.
  • Indication of Performance Variability: The small vertical lines on top of each bar are error bars, which indicate the amount of variation in performance among the individual pigeons. The small size of these bars suggests that the high level of performance was consistent across the group of birds.
Scientific Validity
  • ✅ The figure provides powerful evidence for generalization, a cornerstone of learning.: This experiment provides the most critical piece of evidence in the study for genuine learning. By testing the pigeons on novel stimuli, the authors can distinguish between rote memorization and the generalization of a learned concept. The results shown here strongly support the conclusion that the pigeons learned to identify general features of malignant vs. benign tissue.
  • ✅ The visual evidence strongly supports the paper's central claim.: The data presented in the graph directly and strongly supports the main claim in the caption and reference text: that there was no significant drop in performance when pigeons were faced with novel images. The near-identical heights of the training and testing bars make this conclusion visually compelling.
  • ✅ The consistency of the effect across different magnifications enhances the robustness of the findings.: Demonstrating that this powerful generalization effect holds true across all three magnification levels (4x, 10x, and 20x) significantly strengthens the study's findings. It shows that the learned skill is robust and not limited to a specific level of image detail.
  • 💡 The figure should report the number of subjects (n) and the specific statistical results for the comparisons.: The caption or legend should state the number of subjects (n) whose data is represented in the averages. Furthermore, while the caption states the difference is not statistically significant, it is best practice to include the results of the statistical tests directly on the figure (e.g., by placing 'ns' for 'not significant' above the compared bars) to make it more self-contained.
Communication
  • ✅ The choice of a grouped bar chart is highly effective for the comparison.: The use of a grouped bar chart is an excellent choice for this data. It allows for a direct and intuitive visual comparison between performance on familiar 'Training' images and novel 'Testing' images within each magnification category.
  • ✅ The graph is clearly labeled.: The legend and axes are clearly labeled, and the categories on the x-axis (4x, 10x, 20x) are distinct. This fundamental clarity makes the graph easy to interpret.
  • ✅ The 'chance level' reference line is a strong visual aid.: The inclusion of a dotted line at the 50% mark provides a crucial visual reference for chance-level performance. This immediately communicates to the reader that the pigeons' accuracy in all conditions was substantially better than random guessing.
  • 💡 Using distinct colors instead of shades of gray would enhance visual clarity.: While the two shades of gray are distinguishable, using two distinct, colorblind-friendly colors (e.g., blue for Training, orange for Testing) would improve visual separation and make the graph more immediately accessible and visually engaging.
Fig 8. Training and testing with hue- and brightness-normalized breast...
Full Caption

Fig 8. Training and testing with hue- and brightness-normalized breast histology images. A) The pigeons were able to learn discrimination without the benefit of hue and brightness cues. B) However, the lack of these cues diminished the birds' ability to generalize to new images; compared to an equivalent test of full-color exemplars (see Fig 7), the pigeons performed significantly more poorly, although still well above chance levels.

First Reference in Text
Pigeons were exposed to monochrome, hue-normalized benign and malignant breast images at 10× magnification and achieved high levels of accuracy over 15 days of training (Fig 8A), which again proved to be unaffected during image rotation trials (not shown).
Description
  • Learning Curve with Monochrome Images: This line graph (Panel A) shows the learning curve for pigeons trained on monochrome images, where color and brightness differences between images were removed. The y-axis ('Percent correct') tracks accuracy, while the x-axis tracks the training 'Day' over a 15-day period.
  • Successful Learning Without Color Cues: The graph shows that the pigeons' performance starts near the 50% chance level (random guessing) on Day 1 and steadily increases to a high level of accuracy, reaching approximately 85% by Day 15. This demonstrates that pigeons can learn to distinguish benign from malignant tissue based on texture and shape cues alone, without relying on color.
  • Indication of Performance Variability: The small vertical error bars on each data point represent the variability in performance across the group of pigeons. The relatively small size of these bars indicates that the learning was consistent among the subjects.
Scientific Validity
  • ✅ The experimental design provides a powerful control for isolating key visual features.: This experiment provides a crucial control. By removing color and brightness cues and showing that the pigeons can still learn the task, the authors effectively demonstrate that the discrimination is based on more complex features like morphology and texture, not simple color differences. This significantly strengthens their overall conclusion.
  • ✅ The data strongly supports the authors' claim.: The data presented in the graph provides clear and direct support for the claim in the caption and reference text: pigeons successfully learned the discrimination task even with monochrome, normalized images.
  • 💡 The number of subjects (n) is not reported in the figure.: As with previous figures, the number of subjects (n) used to calculate the average performance and error bars is not stated in the caption or legend. This information is essential for a reader to fully assess the statistical power and reliability of the results and should be included.
Communication
  • ✅ The choice of graph type is appropriate.: The use of a line graph is the correct choice to show performance changes over time, effectively visualizing the learning process.
  • ✅ The 'chance level' reference line is a strong visual aid.: The dotted line indicating the 50% chance level provides an immediate and clear benchmark for the reader to assess the pigeons' performance, making it obvious that they learned the task successfully.
  • ✅ The graph is clear and easy to read.: The graph is clean and uncluttered, with clearly labeled axes, which aids in its readability and interpretation.
Fig 9. Flock sourcing. A "flock-sourcing" score was calculated by summating the...
Full Caption

Fig 9. Flock sourcing. A "flock-sourcing" score was calculated by summating the responses of individual birds as described in the text. Pooling the birds' decisions led to significantly better discrimination than that achieved by individual pigeons.

First Reference in Text
As Fig 9 shows, even though every bird discriminated well above chance level (areas under the curve: 0.85, 0.81, 0.79, 0.73 for individual pigeons, significantly different from chance), individual bird performance was surpassed by the flock score; indeed, the area under the curve for the flock was 0.99.
Description
  • Receiver Operating Characteristic (ROC) Curve: This figure displays a set of Receiver Operating Characteristic (ROC) curves, which are a standard way to evaluate the performance of a classification test. An ROC curve plots a classifier's ability to correctly identify positive cases (sensitivity) against its tendency to incorrectly identify negative cases as positive (false positive rate). A curve that bows further towards the top-left corner indicates a better-performing classifier.
  • Individual vs. 'Flock' Performance: The graph compares the performance of four individual pigeons (labeled 29B, 28Y, 71R, 45W) with the combined performance of the group, termed 'Flock sourcing'. The flock's decision on an image was determined by summing the individual 'malignant' judgments.
  • Performance Levels: The dotted diagonal line represents a classifier with no skill, equivalent to random guessing. All individual pigeon curves are well above this line, indicating skillful discrimination. The 'Flock' curve is positioned highest of all, very close to the top-left corner, signifying near-perfect classification.
  • Area Under the Curve (AUC) Data: The performance of an ROC curve is summarized by the Area Under the Curve (AUC). An AUC of 0.5 represents chance, and 1.0 represents a perfect classifier. The reference text provides the AUC values: the 'Flock' achieved an exceptional AUC of 0.99, while the individual pigeons scored AUCs of 0.85, 0.81, 0.79, and 0.73, all of which are good but clearly inferior to the collective.
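The pooling idea described above can be sketched in a few lines. This is a minimal illustration, assuming the flock score for an image is simply the count of 'malignant' votes across birds; the paper's exact scoring rule is only described as "summating the responses", and the votes below are hypothetical toy data, not the study's results.

```python
def auc(scores, labels):
    """Area under the ROC curve in its pairwise (Mann-Whitney) form:
    the fraction of malignant/benign image pairs in which the malignant
    image receives the higher score, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = malignant, 0 = benign (not taken from the paper).
labels = [1, 1, 1, 0, 0, 0]
votes = {
    "bird_A": [1, 1, 0, 0, 0, 1],
    "bird_B": [1, 0, 1, 0, 1, 0],
    "bird_C": [0, 1, 1, 1, 0, 0],
    "bird_D": [1, 1, 1, 0, 0, 1],
}

# Assumed pooling rule: flock score = number of "malignant" votes per image.
flock_scores = [sum(image_votes) for image_votes in zip(*votes.values())]

individual_auc = {bird: auc(v, labels) for bird, v in votes.items()}
flock_auc = auc(flock_scores, labels)

for bird, value in individual_auc.items():
    print(f"{bird}: AUC = {value:.2f}")
print(f"flock:  AUC = {flock_auc:.2f}")
```

In this toy set each bird errs on a different image, so the errors cancel when votes are summed. The benefit of pooling depends on the birds' mistakes not being strongly correlated.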
Scientific Validity
  • ✅ The use of ROC analysis is a highly rigorous and appropriate method.: The use of ROC analysis is the gold standard for evaluating and comparing the performance of binary classifiers. This represents a highly rigorous and appropriate methodological choice for this type of data.
  • ✅ The 'flock sourcing' analysis provides a novel and significant insight.: The concept of 'flock sourcing' is a novel and insightful way to analyze the data. It demonstrates a principle of collective intelligence ('wisdom of the crowd') in an animal model, showing that pooling imperfect judgments can lead to a highly accurate consensus. This is a significant finding of the study.
  • ✅ The data provides very strong support for the authors' conclusion.: The visual evidence in the graph, combined with the quantitative AUC values reported in the reference text and the caption's statement that pooling led to significantly better discrimination, provides strong support for the paper's conclusion that the pooled 'flock' performance surpasses that of any individual bird.
Communication
  • ✅ The figure's primary message is communicated with excellent visual clarity.: The graph effectively communicates its central message. The clear visual separation between the top 'Flock' curve and the lower individual pigeon curves instantly conveys that the collective judgment is superior to any single individual's judgment.
  • ✅ The visual design effectively emphasizes the key result.: The use of a bold black line for the main result ('Flock') and different colored lines for the individual data creates a strong visual hierarchy, guiding the reader's attention to the most important finding.
  • ✅ The legend and reference line are clear and follow best practices.: The legend is clear, and the inclusion of a dotted diagonal line to represent chance-level performance is a standard and effective practice that provides an immediate baseline for interpretation.
  • 💡 The use of non-standard axes for the ROC curve could cause confusion.: Standard ROC curves plot Sensitivity (True Positive Rate) on the y-axis versus 1-Specificity (False Positive Rate) on the x-axis. This plot uses non-standard axes (Specificity vs. 1-Sensitivity). While mathematically valid, this unconventional representation can be confusing for readers accustomed to the standard format. It is recommended to either replot using the standard axes or explicitly state in the caption that a non-standard representation is being used.
  • 💡 Including AUC values in the legend would improve the figure's self-sufficiency.: While the reference text provides the AUC values, adding them directly to the legend in the figure (e.g., 'Flock (AUC = 0.99)') would make the plot more self-contained and immediately quantifiable for the reader.
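The remark that the non-standard axes are "mathematically valid" can be checked numerically. The sketch below assumes the alternative plot places specificity on the y-axis and 1 - sensitivity on the x-axis, and uses hypothetical ROC points rather than the study's data. Under that assumption each standard point (FPR, TPR) maps to (1 - TPR, 1 - FPR), a reflection that leaves the area under the curve unchanged.

```python
def trapezoid_area(points):
    """Area under a piecewise-linear curve through (x, y) points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical ROC operating points in the standard (FPR, TPR) convention.
standard = [(0.0, 0.0), (0.1, 0.5), (0.3, 0.8), (1.0, 1.0)]

# Assumed alternative convention: x = 1 - sensitivity, y = specificity.
flipped = [(1 - tpr, 1 - fpr) for fpr, tpr in standard]

auc_standard = trapezoid_area(standard)
auc_flipped = trapezoid_area(flipped)

print(f"AUC, standard axes: {auc_standard:.3f}")
print(f"AUC, flipped axes:  {auc_flipped:.3f}")
```

The two areas agree, so the reported AUC values carry the same meaning under either convention; only the visual shape of the curve differs.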
Fig 10. Effect of JPEG image compression. When correct/incorrect responses were...
Full Caption

Fig 10. Effect of JPEG image compression. When correct/incorrect responses were nondifferentially reinforced (gray bars), pigeons' accuracy was affected proportionally to the compression level of the images shown.

First Reference in Text
As Fig 10 shows (gray bars), responses to the uncompressed, 15:1 compression, and 27:1 compression slides across 6 cycles of testing revealed an impact of compression level, with accuracies averaging 94%, 79%, and 73% correct, respectively; pairwise comparisons revealed reliable differences among all three levels of compression (all p values < .05).
Description
  • Bar Chart of Accuracy vs. Image Compression: This figure is a grouped bar chart that illustrates how image compression affects the accuracy of pigeons' classifications under two different feedback conditions. The vertical axis represents 'Percent correct' (accuracy), and the horizontal axis shows three levels of JPEG image compression: 'Uncompressed', '15:1', and '27:1'. Higher compression ratios mean smaller file sizes but more potential for image quality degradation.
  • Comparison of Feedback Conditions: At each compression level, two conditions are compared. The gray bars ('Nondifferential' reinforcement) represent a testing phase where pigeons received a reward regardless of their choice, meaning they got no feedback on whether they were right or wrong. The white bars ('Differential' reinforcement) represent a training phase where pigeons were only rewarded for correct answers, providing clear feedback.
  • Impact of Compression Without Feedback: The gray bars show a clear dose-dependent effect of compression on performance. With no feedback, accuracy was high for uncompressed images (averaging 94%) but dropped significantly as compression increased, to 79% for 15:1 and 73% for 27:1 compression.
  • Adaptation to Compression With Feedback: In stark contrast, the white bars show that when pigeons were given feedback, they could overcome the negative effects of compression. Their accuracy remained very high across all levels: 95% for uncompressed, 92% for 15:1, and 90% for 27:1. This demonstrates a remarkable ability to adapt to degraded visual information when properly trained.
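The contrast between the two feedback conditions is easier to see as a drop in percentage points relative to the uncompressed baseline. The short calculation below uses only the accuracy figures quoted in the description above; the variable names are illustrative.

```python
# Percent correct by reinforcement condition and compression level,
# as quoted in the description above.
accuracy = {
    "nondifferential": {"uncompressed": 94, "15:1": 79, "27:1": 73},
    "differential": {"uncompressed": 95, "15:1": 92, "27:1": 90},
}

# Drop in percentage points relative to the uncompressed baseline.
drop = {
    condition: {
        level: values["uncompressed"] - value
        for level, value in values.items()
        if level != "uncompressed"
    }
    for condition, values in accuracy.items()
}

for condition, levels in drop.items():
    for level, points in levels.items():
        print(f"{condition:>15} @ {level}: -{points} percentage points")
```

Without feedback the drop is 15 and 21 percentage points at 15:1 and 27:1 compression; with feedback it shrinks to 3 and 5 points.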
Scientific Validity
  • ✅ The experimental design robustly isolates the effects of perception and learning.: The experimental design is very strong. By directly comparing performance under nondifferential and differential reinforcement, the authors cleverly dissociate the pure perceptual effect of image degradation from the animal's ability to learn and adapt to it. This is a methodologically elegant way to probe the limits of visual learning.
  • ✅ The study addresses a question of high practical relevance in medical imaging.: The findings have significant practical relevance for the field of digital medical imaging. The results suggest that observers (whether human or animal) can be trained to maintain high accuracy even with compressed images, which is an important consideration for developing image storage and transmission protocols in digital pathology and radiology.
  • ✅ The data strongly supports the authors' conclusions.: The data presented in the graph provides clear and compelling visual support for the conclusions stated in the caption and detailed in the reference text. The dramatic difference between the gray and white bars is unambiguous.
  • 💡 The number of subjects (n) is not reported in the figure.: The caption and legend are missing the number of subjects (n) included in the analysis. This information is critical for assessing the statistical power and generalizability of the findings and should be included for completeness.
  • 💡 The figure could be improved by adding statistical annotations.: The reference text reports that all pairwise comparisons among the three compression levels were reliable (all p values < .05). To make the figure more self-contained, it would be beneficial to add statistical significance annotations (e.g., asterisks or 'ns') directly to the graph to indicate which comparisons are statistically significant.
Communication
  • ✅ The choice of a grouped bar chart is highly effective.: The use of a grouped bar chart is an ideal visualization for this data. It allows for a direct, side-by-side comparison of the two reinforcement conditions (feedback vs. no feedback) at each level of image compression, making the central finding easy to grasp.
  • ✅ The figure effectively communicates the main experimental result.: The graph tells a very clear and compelling visual story. The steep decline of the gray bars versus the sustained height of the white bars immediately communicates the core message: compression impairs baseline performance, but this impairment can be overcome with training.
  • ✅ The labeling is clear and informative.: The axes and legend are clearly labeled, allowing the reader to easily understand the variables being plotted (accuracy, compression level, and reinforcement type).
  • 💡 Use of color could improve visual distinctiveness.: While the gray and white bars are distinct, using two different high-contrast, colorblind-friendly colors (e.g., a blue and an orange) instead of grayscale could enhance visual separation and improve accessibility.

Discussion

Non-Text Elements

Fig 11. Results of training and testing with mammograms with or without...
Full Caption

Fig 11. Results of training and testing with mammograms with or without calcifications. A) Training quickly led to high levels of accuracy. B) The pigeons were able to generalize to novel images, but their performance on this task was not as good as their generalization to novel histology images (Fig 7), although still above chance levels of responding.

First Reference in Text
Birds were able to learn to classify images with and without clusters of microcalcifications as adeptly as they had mastered the initial histopathology challenge.
Description
  • Learning Curve for Microcalcification Detection: This line graph (Panel A) plots the learning progress of pigeons on the task of detecting microcalcifications in mammograms. The vertical y-axis represents their accuracy ('Percent correct'), while the horizontal x-axis shows the training 'Day', up to 25.
  • Performance Improvement Over Time: The graph shows a typical learning curve. The pigeons' performance starts near 50% accuracy (equivalent to random guessing) and steadily increases over the first 14-15 days, after which it plateaus at a high level of accuracy, around 85%.
  • Indication of Performance Variability: The small vertical lines on each data point are error bars, which represent the variability in performance among the group of pigeons. The relatively small size of these bars suggests that the learning rate and final performance were fairly consistent across the subjects.
Scientific Validity
  • ✅ The graph provides strong evidence for learning on a clinically relevant task.: The data clearly demonstrate that pigeons can learn a clinically relevant and difficult visual task (detecting microcalcifications), supporting the central premise of the paper. The learning curve is robust and follows a classic acquisition pattern.
  • ✅ The data demonstrate robust learning despite changes in stimuli.: The reference text mentions a change in protocol (addition of rotated images) partway through training. The graph shows a slight dip around day 15, which may correspond to this change, but performance quickly recovers, demonstrating the robustness of the learned skill. This is a strong methodological component.
  • 💡 The number of subjects (n) is not reported in the figure.: The number of subjects (n) used to generate the average scores and error bars is not provided in the figure caption or legend. This is essential information for evaluating the statistical power and generalizability of the findings and should always be included.
Communication
  • ✅ The choice of a line graph is appropriate for the data.: The use of a line graph is the standard and most effective way to visualize learning acquisition over time. The upward trajectory of the line clearly communicates the pigeons' improving performance.
  • ✅ The graph is clearly labeled and easy to read.: The axes are clearly labeled ('Percent correct', 'Day'), and the data points are distinct, making the graph straightforward to interpret.
  • 💡 Adding a 'chance level' reference line would improve context.: The graph would benefit from a horizontal dotted line at the 50% mark to provide a clear visual reference for 'chance level' performance. This would make it easier to see when the pigeons' accuracy rose above chance; a reference line alone does not establish statistical significance, which depends on the number of trials.
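To make the comparison against chance concrete, the following is a minimal sketch of a one-sided exact binomial test against the 50% chance level. It is not the authors' analysis: the function name `binomial_p_above_chance` and the trial count of 100 per session are assumptions made purely for illustration, since the number of trials per session is not stated in this section.

```python
from math import comb


def binomial_p_above_chance(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided exact binomial p-value: P(X >= correct) if the bird were guessing."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical session size (assumption, not taken from the paper).
TRIALS = 100

# Accuracy near 50%, as at the start of training: not distinguishable from guessing.
p_start = binomial_p_above_chance(50, TRIALS)

# Accuracy near 85%, as at the plateau: very unlikely under guessing.
p_plateau = binomial_p_above_chance(85, TRIALS)

print(f"p at ~50% correct: {p_start:.3f}")
print(f"p at ~85% correct: {p_plateau:.2e}")
```

With fewer trials per session the same percentages would give weaker evidence, which is why a chance line is best read together with the sample size.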
Fig 12. Results of training and testing with mammograms containing masses. A)...
Full Caption

Fig 12. Results of training and testing with mammograms containing masses. A) Pigeons required long training to discriminate between mammograms with masses, and even then, individual differences were pronounced. B) Regardless of their performance in the training phase, all of the pigeons failed to transfer their performance to novel exemplars, suggesting that their performance was based on rote memorization.

Figure/Table Image (Page 15)
First Reference in Text
The birds did well on the first challenge (Fig 11), but poorly on the second (Fig 12).
Description
  • Individual Pigeon Performance Test: This panel (B) is a grouped bar chart that displays the final test performance for each of the four individual pigeons, identified on the x-axis (13Y, 42Y, 60Y, 75B). The y-axis shows their accuracy ('Percent correct').
  • Test of Generalization vs. Rote Memorization: For each pigeon, the chart compares accuracy on two types of images: the familiar 'Training' set (dark gray bar) they had learned over 80 days, and a completely novel 'Testing' set (light gray bar). This comparison is designed to distinguish between genuine understanding (generalization) and simple memorization.
  • Failure to Generalize to Novel Images: The results show a critical failure to generalize. The two pigeons that had learned the training set well (13Y and 60Y, with ~80% accuracy on training images) performed at chance level (~50% accuracy) on the new testing images. The same pattern holds for the moderately successful pigeon (42Y). This indicates that their training performance was based on memorizing specific images, not on learning a general rule for what malignant masses look like.
Scientific Validity
  • ✅ The null result is a scientifically important finding that defines the boundary conditions of the phenomenon.: This panel presents a crucial null result that defines the limits of the pigeons' visual classification abilities. Showing a failure under more difficult conditions is as scientifically important as showing success, and it provides a critical counterpoint to the positive results from the histology and microcalcification experiments.
  • ✅ The experimental design is robust for testing generalization.: The experimental design, directly comparing performance on familiar versus novel stimuli for each individual, is the correct and most rigorous method for testing generalization versus rote memorization.
  • ✅ The data strongly support the conclusion of rote memorization.: The data presented provide clear support for the authors' conclusion that the pigeons relied on rote memorization for this task. The collapse of performance on the testing set is visually obvious.
  • 💡 The lack of error bars omits information about performance variability.: The data shown are averages for each bird over the testing period. The graph could have been strengthened by including error bars (e.g., standard deviation) to represent the day-to-day variability in each bird's performance during the testing phase.
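The error bars suggested above could be derived from each bird's daily scores. The sketch below computes a per-bird mean and sample standard deviation; the bird labels and daily percentages are hypothetical placeholders chosen for illustration, not values from the paper, and `summarize_daily_accuracy` is an assumed helper name.

```python
from statistics import mean, stdev


def summarize_daily_accuracy(daily_accuracy: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Return {bird: (mean percent correct, sample standard deviation)} over testing days."""
    return {bird: (mean(scores), stdev(scores)) for bird, scores in daily_accuracy.items()}


# Hypothetical daily percent-correct scores on novel test images (illustrative only).
testing_scores = {
    "bird_A": [48.0, 52.0, 50.0, 55.0, 45.0],
    "bird_B": [51.0, 49.0, 53.0, 47.0, 50.0],
}

summary = summarize_daily_accuracy(testing_scores)
for bird, (avg, sd) in summary.items():
    # The standard deviation would be drawn as the error bar on that bird's bar.
    print(f"{bird}: mean = {avg:.1f}%, SD = {sd:.1f}")
```

A standard deviation shows day-to-day spread; a standard error or confidence interval would be the alternative if the aim were to show uncertainty in the mean.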
Communication
  • ✅ The choice of a grouped bar chart is highly effective.: The grouped bar chart is the ideal visualization for this comparison. It allows for a direct, powerful, and intuitive comparison of performance on familiar ('Training') versus novel ('Testing') images for each individual bird.
  • ✅ The visual message is exceptionally clear and impactful.: The figure powerfully communicates its main finding: a failure to generalize. The stark contrast between the height of the 'Training' bars and the 'Testing' bars for the successful learners (13Y, 60Y) is visually striking and makes the conclusion of rote memorization unambiguous.
  • ✅ The figure's labeling is clear and effective.: The labels and legend are clear, allowing the reader to easily identify each bird and the two conditions being compared.
  • 💡 Adding a 'chance level' reference line would improve context.: The graph would be improved by adding a horizontal dotted line at the 50% accuracy level. This would provide an immediate visual benchmark for 'chance level' performance, making it even clearer that performance on the testing set collapsed to random guessing.
Fig 13. Conflictive histology exemplars. During Experiment 1, some exemplars...
Full Caption

Fig 13. Conflictive histology exemplars. During Experiment 1, some exemplars from a given category looked like exemplars from the other category causing the birds to incorrectly categorize them.

Figure/Table Image (Page 17)
First Reference in Text
For example, the benign image posing the greatest difficulty (Fig 13, upper left panel) contained breast lobular structures that were indeed benign, but nevertheless highly cellular and densely packed; consequently, at low magnification, they could resemble sheets of cancer cells.
Description
  • Visual Error Analysis: This figure presents a visual analysis of specific errors made by the pigeons. It shows histology images (microscopic views of tissue) that were particularly challenging. The layout compares a 'Conflictive sample' (left column) with typical 'Opposite category samples' (middle and right columns).
  • Conflictive Benign Sample: The top row analyzes a difficult benign (non-cancerous) case. The image on the far left is the conflictive sample, which is truly benign. However, as the reference text explains, it is 'highly cellular and densely packed,' meaning it has an unusually high number of cells crowded together. This gives it a strong visual resemblance to the two examples on its right, which are typical malignant (cancerous) tissues.
  • Conflictive Malignant Sample: The bottom row analyzes a difficult malignant (cancerous) case. The image on the far left is the conflictive sample, which is truly malignant. However, it is described as being 'hypocellular' (having fewer cells) and containing 'duct-like structures' (organized formations resembling normal breast ducts). These features make it look visually similar to the two examples on its right, which are typical benign tissues.
Scientific Validity
  • ✅ The figure provides an insightful error analysis, strengthening the study's conclusions.: This figure represents a strong form of error analysis. By moving beyond overall accuracy scores to investigate the specific types of stimuli that caused confusion, the authors provide deeper insight into the perceptual strategies and limitations of their subjects. This adds significant depth to their findings.
  • ✅ The analysis provides evidence for systematic, feature-based errors, not random guessing.: The analysis demonstrates that the pigeons' errors were not random. Instead, they were systematic, occurring when an image from one category shared key visual features with the other category. This supports the idea that the pigeons were learning a classification rule grounded in visual texture and morphology.
  • ✅ The findings strengthen the validity of the pigeon as a model for human visual learning.: These 'conflictive' or 'atypical' cases are precisely the types of images that are challenging for human pathology trainees. By showing that pigeons are confused by the same ambiguous features, the authors strengthen their argument that pigeons can serve as a relevant model for certain aspects of human medical image perception.
  • 💡 The qualitative analysis could be supported by quantitative error rates for these specific images.: The analysis is qualitative. While visually compelling, it could be enhanced by providing quantitative data on these specific exemplars. For instance, reporting the specific error rate for each of these conflictive images across the flock would add quantitative weight to the claim that these were indeed the 'most difficult' samples.
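One way to produce the per-image error rates suggested above is to pool trial records across the flock and rank images by how often they were misclassified. This is a minimal sketch under stated assumptions: the record layout `(bird, image_id, correct)`, the helper name `error_rate_per_image`, and all example records are hypothetical and are not taken from the study's data.

```python
from collections import defaultdict


def error_rate_per_image(trials: list[tuple[str, str, bool]]) -> dict[str, float]:
    """Pool trials across birds and return {image_id: fraction of incorrect responses}."""
    counts = defaultdict(lambda: [0, 0])  # image_id -> [errors, total trials]
    for _bird, image_id, correct in trials:
        counts[image_id][1] += 1
        if not correct:
            counts[image_id][0] += 1
    return {image_id: errors / total for image_id, (errors, total) in counts.items()}


# Hypothetical trial records (illustrative only): (bird, image_id, answered correctly?)
example_trials = [
    ("bird_A", "img_01", True),
    ("bird_B", "img_01", False),
    ("bird_A", "img_02", False),
    ("bird_B", "img_02", False),
    ("bird_A", "img_03", True),
    ("bird_B", "img_03", True),
]

rates = error_rate_per_image(example_trials)

# Rank images from most to least error-prone to identify candidate 'conflictive' exemplars.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
for image_id, rate in ranked:
    print(f"{image_id}: error rate = {rate:.2f}")
```

Reporting such a table alongside the figure would show how far the highlighted exemplars stand out from the rest of the image set.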
Communication
  • ✅ The comparative layout is highly intuitive and effective.: The figure's layout is exceptionally effective. By placing a 'conflictive' sample directly adjacent to examples of the 'opposite category' it was mistaken for, the figure provides an immediate and intuitive visual comparison that clearly explains the source of the pigeons' errors.
  • ✅ The figure is well-labeled.: The labeling of the columns ('Conflictive sample', 'Opposite category samples') and rows ('Benign', 'Malignant') is clear and crucial for understanding the logic of the figure. It guides the reader through the error analysis effectively.
  • 💡 The lack of annotations makes it difficult to identify the specific conflictive features.: The figure's message would be significantly enhanced with annotations. For example, adding arrows to point out the 'highly cellular' regions in the top-left image or outlining the misleading 'duct-like structures' in the bottom-left image would make the authors' points from the text visually explicit and more accessible to a non-expert audience.